In computer science, a symbol table is a data structure used by a language translator such as a compiler or interpreter, where each identifier, symbol, constant, procedure and Subroutine in a program's source code is associated with information relating to its declaration or appearance in the source. In other words, the entries of a symbol table store the information related to the entry's corresponding symbol.
A compiler may use one large symbol table for all symbols or use separated, or hierarchical symbol tables for different scopes. For example, in a scoped language such as ALGOL or PL/I a symbol "p" can be declared separately in several procedures, perhaps with different attributes. The scope of each declaration is the section of the program in which references to "p" resolve to that declaration. Each declaration represents a unique identifier "p". The symbol table must have some means of differentiating references to the different "p"s.
A common data structure used to implement symbol tables is the hash table. The time for searching in hash tables is independent of the number of elements stored in the table, so it is efficient for a large number of elements. It also simplifies the classification of literals in tabular format by including the classification in calculation of the hash key.
As the lexical analyser spends a great proportion of its time looking up the symbol table, this activity has a crucial effect on the overall speed of the compiler. A symbol table must be organised in such a way that entries can be found as quickly as possible. Hash tables are usually used to organise a symbol table, where the keyword or identifier is 'hashed' to produce an array subscript. hash collision are inevitable in a hash table, and a common way of handling them is to store the synonym in the next available free space in the table.
While reverse engineering an executable, many tools refer to the symbol table to check what addresses have been assigned to global variables and known functions. If the symbol table has been stripped or cleaned out before being converted into an executable, tools will find it harder to determine addresses or understand anything about the program.
// Define a public function double foo(int count) {
double sum = 0.0;
// Sum all the values bar(1) to bar(count) for (int i = 1; i <= count; i++) sum += bar((double) i); return sum;}
A C compiler that parses this code will contain at least the following symbol table entries:
extern |
function parameter |
global |
function parameter |
block local |
for-loop statement |
In addition, the symbol table may also contain entries generated by the compiler for intermediate expression values (e.g., the expression that casts the bar loop variable into a x, and the return value of the call to function foo), statement labels, and so forth.
+Example table: SysV ABI |
T_BIT |
F_BIT |
I_BIT |
irqvec |
fiqvec |
InitReset |
_main |
End |
AT91F_US3_CfgPIO_useB |
AT91F_PIO_CfgPeriph |
Entry point |
AT91F_DBGU_Printk |
AT91F_US_TxReady |
AT91F_US_PutChar |
AT91F_SpuriousHandler |
AT91F_DataAbort |
AT91F_FetchAbort |
AT91F_Undef |
AT91F_UndefHandler |
AT91F_LowLevelInit |
AT91F_DBGU_CfgPIO |
AT91F_PIO_CfgPeriph |
AT91F_US_Configure |
AT91F_US_SetBaudrate |
AT91F_US_Baudrate |
AT91F_US_SetTimeguard |
AT91F_PDC_Open |
AT91F_PDC_DisableRx |
AT91F_PDC_DisableTx |
AT91F_PDC_SetNextTx |
AT91F_PDC_SetNextRx |
AT91F_PDC_SetTx |
AT91F_PDC_SetRx |
AT91F_PDC_EnableRx |
AT91F_PDC_EnableTx |
AT91F_US_EnableTx |
__aeabi_uidiv |
__udivsi3 |
__aeabi_uidivmod |
__aeabi_idiv0 |
__aeabi_ldiv0 |
__div0 |
_data |
_etext |
__bss_end__ |
__bss_start |
__bss_start__ |
_edata |
_end |
An example of a symbol table can be found in the SysV Application Binary Interface (ABI) specification, which mandates how symbols are to be laid out in a binary file, so that different compilers, linkers and loaders can all consistently find and work with the symbols in a compiled object.
The SysV ABI is implemented in the GNU binutils' nm utility. This format uses a sorted memory address field, a "symbol type" field, and a symbol identifier (called "Name").
The symbol types in the SysV ABI (and nm's output) indicate the nature of each entry in the symbol table. Each symbol type is represented by a single character. For example, symbol table entries representing initialized data are denoted by the character "d" and symbol table entries for functions have the symbol type "t" (because executable code is located in the text section of an object file). Additionally, the capitalization of the symbol type indicates the type of linkage: lower-case letters indicate the symbol is local and upper-case indicates external (global) linkage.
Both the LISP and the Scheme programming languages allow arbitrary, generic properties to be associated with each symbol.Symbols - Guile Documentation
The Prolog programming language is essentially a symbol-table manipulation language; symbols are called atoms, and the relationships between symbols can be reasoned over. Similarly, OpenCog provides a dynamic symbol table, called the atomspace, which is used for knowledge representation.
|
|